The rise of e-commerce and social networking platforms has led to an increase in the disclosure of personal health information within user-generated content. This study investigates the application of large language models (LLMs) to detect and sanitize sensitive health data shared by users across platforms such as Amazon, patient.info, and Facebook. We propose a methodology that leverages LLMs to evaluate both the sensitivity of disclosed information and the platform-specific semantics of the content. Through prompt engineering, our method identifies sensitive information and rephrases it to minimize disclosure while preserving content similarity. ChatGPT serves as the LLM in this study due to its versatility. Empirical results suggest that ChatGPT can reliably assign sensitivity scores to user-generated text and generate sanitized versions that effectively preserve the original meaning.
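The detect-and-rephrase pipeline described above is straightforward to illustrate. The following is a minimal Python sketch assuming the OpenAI chat-completions SDK; the model name, prompt wording, and 1-to-5 sensitivity scale are illustrative assumptions, not the study's actual prompts.

```python
# A minimal sketch of the two-step approach: score sensitivity, then rephrase.
# Assumes the OpenAI Python SDK; prompts and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o-mini"  # assumed model; the study used ChatGPT

def sensitivity_score(text: str) -> int:
    """Ask the model to rate how much personal health information the text discloses."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": "Rate the personal health information disclosed in the "
                        "user's text on a scale of 1 (none) to 5 (highly "
                        "sensitive). Reply with the number only."},
            {"role": "user", "content": text},
        ],
    )
    return int(resp.choices[0].message.content.strip())

def sanitize(text: str) -> str:
    """Rephrase the text to minimize health disclosures while preserving its meaning."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": "Rewrite the user's text so that it no longer discloses "
                        "personal health information, while keeping the rest of "
                        "its meaning as close to the original as possible."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip()
```

In practice, one would score a post, sanitize it only if the score exceeds some threshold, and rescore the rewrite to confirm that the disclosure was actually reduced.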
-
Human users are often the weakest link in cybersecurity, with a large percentage of security breaches attributed to some form of human error. When confronted with everyday cybersecurity questions (or any other question, for that matter), users tend to turn to search engines, online forums, and, more recently, chatbots. We report on a study of the effectiveness of answers generated by two popular chatbots to an initial set of questions covering typical cybersecurity challenges faced by users (e.g., phishing, use of VPNs, multi-factor authentication). The study looks not only at the accuracy of the answers generated by the chatbots but also at whether those answers are understandable, whether they are likely to motivate users to follow any provided recommendations, and whether those recommendations are actionable. Somewhat surprisingly, this initial study suggests that state-of-the-art chatbots are already reasonably good at providing accurate answers to common cybersecurity questions. Yet the study also suggests that the chatbots are not very effective at generating answers that are relevant, actionable, and, most importantly, likely to motivate users to heed their recommendations. The study proceeds with the design and evaluation of prompt engineering techniques intended to improve the effectiveness of the answers the chatbots generate. Initial results suggest that it is possible to improve the effectiveness of answers, in particular their likelihood of motivating users to heed recommendations and users' ability to act on those recommendations, without diminishing accuracy. We discuss the implications of these initial results and plans for future work in this area.
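The kind of prompt engineering explored here can be sketched as a comparison between a raw question and the same question wrapped in instructions targeting the dimensions the study evaluates (accuracy, understandability, motivation, actionability). Again a hedged Python sketch assuming the OpenAI SDK; the instruction text is an illustrative assumption, not the paper's actual prompts.

```python
# Compare a baseline prompt with an engineered one on a sample cybersecurity
# question. Assumes the OpenAI Python SDK; all prompt text is illustrative.
from openai import OpenAI

client = OpenAI()
QUESTION = "Should I use a VPN on public Wi-Fi?"  # hypothetical sample question

# Baseline: pass the user's question through unchanged.
BASELINE = [{"role": "user", "content": QUESTION}]

# Engineered: prepend a system prompt targeting the study's evaluation
# dimensions: accuracy, understandability, motivation, and actionability.
ENGINEERED = [
    {"role": "system",
     "content": "You are a security advisor for non-expert users. Answer "
                "accurately and in plain language, briefly explain the risk so "
                "the user is motivated to act, and end with two or three "
                "concrete, numbered steps the user can take right now."},
    {"role": "user", "content": QUESTION},
]

for label, messages in [("baseline", BASELINE), ("engineered", ENGINEERED)]:
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(f"--- {label} ---\n{resp.choices[0].message.content}\n")
```

The design choice worth noting is that the engineered prompt does not ask for more facts; it constrains the form of the answer (plain language, a stated risk, numbered next steps), which is where the study found the baseline answers fell short.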
-
The proliferation of platforms such as e-commerce and social networks has led to an increasing amount of personal health information being disclosed in user-generated content. This study investigates the use of Large Language Models (LLMs) to detect and sanitize sensitive health data disclosures in reviews posted on Amazon. Specifically, we present an approach that uses ChatGPT to evaluate both the sensitivity and the informativeness of Amazon reviews. The approach uses prompt engineering to identify sensitive content and rephrase reviews so as to reduce sensitive disclosures while maintaining informativeness. Empirical results indicate that ChatGPT can reliably assign sensitivity and informativeness scores to user-generated reviews and can generate sanitized reviews that remain informative.
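The dual scoring step, rating a single review on both sensitivity and informativeness, can be sketched as one structured call. The Python sketch below assumes the OpenAI SDK's JSON response mode; the prompt wording and 1-to-5 scales are illustrative assumptions, not the paper's calibrated instrument.

```python
# A hedged sketch of dual scoring for Amazon reviews using JSON output.
# Assumes the OpenAI Python SDK; prompt text and scales are illustrative.
import json
from openai import OpenAI

client = OpenAI()

def score_review(review: str) -> dict:
    """Return sensitivity and informativeness scores (1-5) for one review."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; the study used ChatGPT
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": (
                 "Score this Amazon review and reply as JSON with integer fields "
                 '{"sensitivity": 1-5, "informativeness": 1-5}. Sensitivity '
                 "measures disclosed personal health information; informativeness "
                 "measures usefulness to other shoppers."
             )},
            {"role": "user", "content": review},
        ],
    )
    return json.loads(resp.choices[0].message.content)

# Hypothetical example review:
review = "I bought this glucose monitor after my type 2 diabetes diagnosis."
print(score_review(review))  # e.g. {"sensitivity": 4, "informativeness": 3}
```

Scoring both dimensions on the same review makes the sanitization trade-off explicit: a rewrite is only worth keeping if it lowers the sensitivity score without a comparable drop in informativeness.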
